Errors quoted on results are often given in asymmetric form. An account is given of the two ways these can arise in an analysis, and the combination of asymmetric errors is discussed. It is shown that the usual method has no basis and is indeed wrong. For asymmetric systematic errors, a consistent method is given, with detailed examples. For asymmetric statistical errors a general approach is outlined.



1. Asymmetric Errors 

In the reporting of results from particle physics experiments it is common to see values given with errors with different positive and negative numbers, to denote a 68% central confidence region which is not symmetric about the central estimate. For example (one of many), the Particle Data Group [1] quote

$$\mathrm{BR}(f_2(1270) \to \pi\pi) = \left(84.7^{+2.4}_{-1.3}\right)\%.$$

The purpose of this note is to describe how such errors arise and how they can properly be handled, particularly when two contributions are combined. Current practice is to combine such errors separately, i.e. to add the $\sigma^+$ values together in quadrature, and then do the same thing for the $\sigma^-$ values. This is not, to my knowledge, documented anywhere and, as will be shown, is certainly wrong.

There are two separate sources of asymmetry, which unfortunately require different treatments. We call these 'statistical' and 'systematic'; the labels are fairly accurate though not entirely so, and they could equally well be called 'frequentist' and 'Bayesian'.

Asymmetric statistical errors arise when the log likelihood curve is not well described by a parabola. The one sigma values (or, equivalently, the 68% central confidence level interval limits) are read off the points at which $\ln L$ falls from its peak by $\tfrac{1}{2}$ or, equivalently, when $\chi^2$ rises by 1. This is not strictly accurate, and corrections should be made using Bartlett functions, but that lies beyond the scope of this note.

Asymmetric systematic errors arise when the dependence of a result on a 'nuisance parameter' is non-linear. Because the dependence on such parameters (theoretical values, experimental calibration constants, and so forth) is generally complicated, involving Monte Carlo simulation, this study generally has to be performed by evaluating the result $x$ at the $-\sigma$ and $+\sigma$ values of the nuisance parameter $a$ (see [4] for a fuller account), giving $\sigma^-$ and $\sigma^+$. ($a \pm \sigma$ gives $\sigma^\pm$ or $\sigma^\mp$ according to the sign of $dx/da$.)

This note summarises a full account of the procedure for asymmetric systematic errors, which can be found in [5], and describes what has subsequently been achieved for asymmetric statistical errors. For another critical account see [6].
2. Asymmetric Systematic Errors

If $\sigma^-$ and $\sigma^+$ are different then this is a sign that the dependence of $x$ on $a$ is non-linear and the symmetric distribution in $a$ gives an asymmetric distribution in $x$. In practice, if the difference is not large, one might be well advised to assume a straight-line dependence and take the error as symmetric; however, we will assume that this is not a case where this is appropriate. We consider cases where a non-linear effect is not small enough to be ignored entirely, but not large enough to justify a long and intensive investigation. Such cases are common enough in practice.

2.1. Models 

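For simplicity we transform $a$ to the variable $u$ described by a unit Gaussian, and work with $X(u) = x(u) - x(0)$. It is useful to define the mean error $\sigma$, the half-difference $\alpha$, and the asymmetry $A$:

$$\sigma = \frac{\sigma^+ + \sigma^-}{2}, \qquad \alpha = \frac{\sigma^+ - \sigma^-}{2}, \qquad A = \frac{\sigma^+ - \sigma^-}{\sigma^+ + \sigma^-}. \tag{1}$$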

There are infinitely many non-linear relationships between $u$ and $X$ that will go through the three determined points. We consider two. We make no claim that either of these is 'correct', but working with asymmetric errors must involve some model of the non-linearity. Practitioners must select one of these two models, or some other (to which the same formalism can be applied), on the basis of their knowledge of the problem, their preference and experience.

• Model 1: Two straight lines 

Two straight lines are drawn, meeting at the central value:

$$X = \begin{cases} \sigma^+ u & u \ge 0 \\ \sigma^- u & u < 0 \end{cases} \tag{2}$$

• Model 2: A quadratic function 

The parabola through the three points is

$$X = \sigma u + \alpha u^2 = \sigma u + A\sigma u^2. \tag{3}$$
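To make the two models concrete, here is a minimal Python sketch of Equations (1) to (3). The function names (`asym_params`, `x_model1`, `x_model2`) are hypothetical, chosen only for illustration; the arguments are the quoted $\sigma^-$ and $\sigma^+$.

```python
def asym_params(sigma_minus, sigma_plus):
    """Mean error sigma, half-difference alpha and asymmetry A, as in Equation (1)."""
    sigma = 0.5 * (sigma_plus + sigma_minus)
    alpha = 0.5 * (sigma_plus - sigma_minus)
    return sigma, alpha, alpha / sigma


def x_model1(u, sigma_minus, sigma_plus):
    """Model 1 (Equation 2): two straight lines meeting at u = 0."""
    return sigma_plus * u if u >= 0 else sigma_minus * u


def x_model2(u, sigma_minus, sigma_plus):
    """Model 2 (Equation 3): the parabola X = sigma*u + alpha*u**2."""
    sigma, alpha, _ = asym_params(sigma_minus, sigma_plus)
    return sigma * u + alpha * u * u


# Both models pass through (-1, -sigma_minus), (0, 0) and (+1, +sigma_plus),
# but they separate outside -1 < u < 1.
for u in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(f"u={u:+.0f}  model1={x_model1(u, 0.8, 1.2):+.3f}  model2={x_model2(u, 0.8, 1.2):+.3f}")
```

Printing a few values shows the two curves agreeing at $u = 0, \pm 1$ and diverging beyond that range, as in Figure 1.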
These forms are shown in Figure 1 for a small asymmetry of 0.1, and a larger asymmetry of 0.4.




Figure 1: Some nonlinear dependencies 

Model 1 is shown as a solid line, and Model 2 is dashed. Both go through the three specified points. The differences between them within the range $-1 < u < 1$ are not large; outside that range they diverge considerably.

The distribution in $u$ is a unit Gaussian, $G(u)$, and the distribution in $X$ is obtained from $P(X) = G(u)/\left|dX/du\right|$. Examples are shown in Figure 2. For Model 1 (again a solid line) this gives a dimidated Gaussian: two half-Gaussians with different standard deviations for $X > 0$ and $X < 0$. This is sometimes called a 'bifurcated Gaussian', but this is inaccurate. 'Bifurcated' means 'split' in the sense of forked. 'Dimidated' means 'cut in half', with the subsidiary meaning of 'having one part much smaller than the other' [7].

For Model 2 (dashed) with small asymmetries the curve is a distorted Gaussian, given by

$$P(X) = \frac{G(u)}{\sqrt{\sigma^2 + 4\alpha X}}, \qquad u = \frac{\sqrt{\sigma^2 + 4\alpha X} - \sigma}{2\alpha}.$$

For larger asymmetries and/or larger $|X|$ values, the second root also has to be considered.
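The transformation $P(X) = G(u)/|dX/du|$ can be checked numerically. The sketch below (hypothetical helper names; a plain trapezoidal rule rather than any particular library) evaluates both densities and confirms that each integrates to about one. For Model 2 it keeps only the principal root, so, as noted above, it is only adequate for small asymmetries and moderate $|X|$.

```python
import math


def gauss(u):
    """Unit Gaussian density G(u)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def pdf_model1(x, sigma_minus, sigma_plus):
    """Dimidated Gaussian: G(u)/|dX/du| with u = X/sigma_plus (X >= 0) or X/sigma_minus (X < 0)."""
    s = sigma_plus if x >= 0 else sigma_minus
    return gauss(x / s) / s


def pdf_model2(x, sigma_minus, sigma_plus):
    """Distorted Gaussian from the principal root of X = sigma*u + alpha*u**2 (second root ignored)."""
    sigma = 0.5 * (sigma_plus + sigma_minus)
    alpha = 0.5 * (sigma_plus - sigma_minus)
    if alpha == 0.0:
        return gauss(x / sigma) / sigma
    disc = sigma * sigma + 4.0 * alpha * x
    if disc <= 0.0:
        return 0.0  # beyond the turning point of the parabola
    root = math.sqrt(disc)  # equals dX/du on the principal branch
    u = (root - sigma) / (2.0 * alpha)
    return gauss(u) / root


def integrate(f, lo, hi, n=20000):
    """Trapezoidal rule, good enough for a normalisation check."""
    h = (hi - lo) / n
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, n):
        total += f(lo + i * h)
    return total * h


print(integrate(lambda x: pdf_model1(x, 0.8, 1.2), -10.0, 12.0))
print(integrate(lambda x: pdf_model2(x, 0.9, 1.1), -2.5, 12.0))  # -2.5 is the turning point
```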
It can be seen that the Model 1 dimidated Gaussian and Model 2 distorted Gaussian are not dissimilar if the asymmetry is small, but are very different if the asymmetry is large.

2.2. Bias 

If a nuisance parameter $u$ is distributed with a Gaussian probability distribution, and the quantity $X(u)$ is a nonlinear function of $u$, then the expectation $\langle X \rangle$ is not $X(\langle u \rangle)$.

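For Model 1 one has

$$\langle X \rangle = \frac{\sigma^+ - \sigma^-}{\sqrt{2\pi}} \tag{4}$$

For Model 2 one has

$$\langle X \rangle = \frac{\sigma^+ - \sigma^-}{2} = \alpha \tag{5}$$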

Hence in these models (or any others), if the result quoted is $x(0)$, it is not the mean. It differs from it by an amount of the order of the difference between the positive and negative errors. It is perhaps defensible as a number to quote as the result, as it is still the median: there is a 50% chance that the true value is below it and a 50% chance that it is above.
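A quick Monte Carlo check of Equations (4) and (5), using only Python's standard `random` and `statistics` modules: draw $u$ from a unit Gaussian, map it through a model, and compare the sample mean with $\langle X \rangle$ and the sample median with zero. The helper names are hypothetical.

```python
import math
import random
import statistics


def sample_x(model, sigma_minus, sigma_plus, n, seed=1):
    """Draw u from a unit Gaussian and map it through Model 1 or Model 2."""
    rng = random.Random(seed)
    sigma = 0.5 * (sigma_plus + sigma_minus)
    alpha = 0.5 * (sigma_plus - sigma_minus)
    xs = []
    for _ in range(n):
        u = rng.gauss(0.0, 1.0)
        if model == 1:
            xs.append(sigma_plus * u if u >= 0 else sigma_minus * u)
        else:
            xs.append(sigma * u + alpha * u * u)
    return xs


def expected_mean(model, sigma_minus, sigma_plus):
    """<X> from Equation (4) for Model 1 or Equation (5) for Model 2."""
    if model == 1:
        return (sigma_plus - sigma_minus) / math.sqrt(2.0 * math.pi)
    return 0.5 * (sigma_plus - sigma_minus)


for model in (1, 2):
    xs = sample_x(model, 0.5, 1.5, 200_000)
    print(f"Model {model}: sample mean {statistics.fmean(xs):.4f}, "
          f"expected {expected_mean(model, 0.5, 1.5):.4f}, "
          f"sample median {statistics.median(xs):.4f}")
```

The median check is cleanest for Model 1, whose mapping from $u$ to $X$ is monotonic; for Model 2 the median stays at zero only approximately, the more so the further the turning point of the parabola lies in the tail.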

2.3. Adding Errors 

If a derived quantity $z$ contains parts from two quantities $x$ and $y$, so that $z = x + y$, the distribution in $z$ is given by the convolution:

$$f_z(z) = \int dx\, f_x(x)\, f_y(z - x) \tag{6}$$




Figure 2: Probability density functions from Figure 1



Figure 3: Examples of the distributions from combined 
asymmetric errors using Model 1. 

With Model 1 the convolution can be done analytically. Some results for typical cases are shown in Figure 3. The solid line shows the convolution; the dashed line is obtained by adding the positive and negative standard deviations separately in quadrature (the 'usual procedure'). The dotted line is described later.

The solid and dashed curves disagree markedly. The 'usual procedure' curve has a larger skew than the convolution. This is obvious: if two distributions with the same asymmetry are added, the 'usual procedure' will give a distribution just scaled by $\sqrt{2}$, with the same asymmetry. This violates the Central Limit Theorem, which says that convoluting identical distributions must result in a combined distribution which is more Gaussian, and therefore more symmetric, than its components. This shows that the 'usual procedure' for adding asymmetric errors is inconsistent.
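The Central Limit argument can be seen directly in a small simulation. This sketch (standard library only; helper names are hypothetical) compares the sample skewness of a single dimidated Gaussian with $\sigma^- = 0.5$, $\sigma^+ = 1.5$, of the sum of two such independent contributions (the true convolution), and of the 'usual procedure' result, which is the same dimidated shape with both errors scaled by $\sqrt{2}$.

```python
import math
import random


def draw_model1(rng, sigma_minus, sigma_plus):
    """One draw from a dimidated Gaussian (Model 1)."""
    u = rng.gauss(0.0, 1.0)
    return sigma_plus * u if u >= 0 else sigma_minus * u


def skewness(xs):
    """Sample skewness: third central moment divided by variance**1.5."""
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n
    m3 = sum((x - m) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


def compare_skews(sigma_minus=0.5, sigma_plus=1.5, n=200_000, seed=2):
    rng = random.Random(seed)
    single = [draw_model1(rng, sigma_minus, sigma_plus) for _ in range(n)]
    # true convolution: the sum of two independent contributions
    convolved = [draw_model1(rng, sigma_minus, sigma_plus) + draw_model1(rng, sigma_minus, sigma_plus)
                 for _ in range(n)]
    # 'usual procedure': sigma- and sigma+ each added in quadrature, i.e. scaled by sqrt(2)
    scale = math.sqrt(2.0)
    usual = [draw_model1(rng, scale * sigma_minus, scale * sigma_plus) for _ in range(n)]
    return skewness(single), skewness(convolved), skewness(usual)


single, convolved, usual = compare_skews()
print(f"single {single:.3f}  convolution {convolved:.3f}  usual procedure {usual:.3f}")
```

The convolution comes out markedly less skewed than its components, while the 'usual procedure' reproduces the component's skewness unchanged.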

2.4. A consistent addition technique 

If a distribution for $x$ is described by some function, $f(x; x_0, \sigma^+, \sigma^-)$, which is a Gaussian transformed according to Model 1 or Model 2 or anything else, then 'combination of errors' involves a convolution of two such functions according to Equation (6). This combined function is not necessarily a function of the same form: it is a special property of the Gaussian that the convolution of two Gaussians gives a third. The (solid line) convolution of two dimidated Gaussians is not itself a dimidated Gaussian. Figure 3 is a demonstration of this.

Although the form of the function is changed by a convolution, some things are preserved. The semi-invariant cumulants of Thiele (the coefficients of the power series expansion of the log of the Fourier transform) add under convolution. The first two of these are the usual mean and variance. The third is the unnormalised skew:

$$\gamma = \langle x^3 \rangle - 3\langle x \rangle\langle x^2 \rangle + 2\langle x \rangle^3 \tag{7}$$
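Additivity of the first three cumulants is easy to check exactly on discrete toy distributions, using rational arithmetic so that no rounding is involved. The two distributions below are arbitrary illustrations, not taken from the text, and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def cumulants(pmf):
    """Mean, variance and unnormalised skew (Equation 7) of a discrete pmf {value: probability}."""
    m1 = sum(p * x for x, p in pmf.items())
    m2 = sum(p * x ** 2 for x, p in pmf.items())
    m3 = sum(p * x ** 3 for x, p in pmf.items())
    return m1, m2 - m1 ** 2, m3 - 3 * m1 * m2 + 2 * m1 ** 3


def convolve(pmf_a, pmf_b):
    """Distribution of the sum of two independent discrete variables (a discrete Equation 6)."""
    out = {}
    for (xa, pa), (xb, pb) in product(pmf_a.items(), pmf_b.items()):
        out[xa + xb] = out.get(xa + xb, 0) + pa * pb
    return out


# Two deliberately skewed toy distributions (hypothetical, for illustration only)
pa = {0: Fraction(1, 2), 1: Fraction(1, 4), 3: Fraction(1, 4)}
pb = {-1: Fraction(1, 3), 2: Fraction(2, 3)}
ca, cb, cab = cumulants(pa), cumulants(pb), cumulants(convolve(pa, pb))
print(ca, cb, cab)  # each entry of cab is the sum of the matching entries of ca and cb
```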

Within the context of any model, a consistent approach to the combination of errors is to find the mean, variance and skew, $\mu$, $V$ and $\gamma$, for each contributing function separately. Adding these up gives the mean, variance and skew of the combined function. Working within the model one then determines the values of $\sigma^-$, $\sigma^+$ and $x_0$ that give this mean, variance and skew.



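For Model 1 these cumulants are

$$
\begin{aligned}
\mu &= x_0 + \frac{\sigma^+ - \sigma^-}{\sqrt{2\pi}} \\
V &= \sigma^2 + \alpha^2\left(1 - \frac{2}{\pi}\right) \\
\gamma &= \frac{1}{\sqrt{2\pi}}\left[2\left((\sigma^+)^3 - (\sigma^-)^3\right) - \frac{3}{2}\left(\sigma^+ - \sigma^-\right)\left((\sigma^+)^2 + (\sigma^-)^2\right) + \frac{1}{\pi}\left(\sigma^+ - \sigma^-\right)^3\right]
\end{aligned}
\tag{8}
$$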

Given several error contributions, Equations (8) give the cumulants $\mu$, $V$ and $\gamma$ of each. Adding these up gives the first three cumulants of the combined distribution. Then one can find the set of parameters $\sigma^-$, $\sigma^+$, $x_0$ which give these values by using Equations (8) in the other sense.

It is convenient to work with $\Delta$, where $\Delta$ is the difference between the final $x_0$ and the sum of the individual ones. The parameter is needed because of the bias mentioned earlier. Even though each contribution may have $x_0 = 0$, i.e. it describes a spread about the quoted result, it has non-zero $\mu_i$ through the bias effect (c.f. Equations (4) and (5)). The $\sigma^+$ and $\sigma^-$ of the combined distribution, obtained from the total $V$ and $\gamma$, will in general not give the right $\mu$ unless a location shift $\Delta$ is added. The value of the quoted result will shift.

Recalling Section 2.2, for the original distribution one could defend quoting the central value as it was the median, even though it was not the mean. The convoluted distribution not only has a non-zero mean, it also (as can be seen in Figure 3) has a non-zero median. If you want to combine asymmetric errors then you have to accept that the quoted value will shift. To make this correction requires a real belief in the asymmetry of the error values. At this point practitioners, unless they are sure that their errors really do have a significant asymmetry, may be persuaded to revert to quoting symmetric errors.

Solving Equations (8) for $\sigma^-$, $\sigma^+$ and $x_0$ given $\mu$, $V$ and $\gamma$ has to be done numerically. A program for this is available on http://www.slac.stanford.edu/~barlow. Some results are shown in the dotted curve of Figure 3 and in Table I.



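Table I: Adding errors in Model 1

| $\sigma_1^-$ | $\sigma_1^+$ | $\sigma_2^-$ | $\sigma_2^+$ | $\sigma^-$ | $\sigma^+$ | $\Delta$ |
|---|---|---|---|---|---|---|
| 1.0 | 1.0 | 0.8 | 1.2 | 1.32 | 1.52 | 0.08 |
| 0.8 | 1.2 | 0.8 | 1.2 | 1.22 | 1.61 | 0.16 |
| 0.5 | 1.5 | 0.8 | 1.2 | 1.09 | 1.78 | 0.28 |
| 0.5 | 1.5 | 0.5 | 1.5 | 0.97 | 1.93 | 0.41 |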
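The inversion of Equations (8) can be sketched in a few lines of Python. This is not the program referred to above, whose interface is not described here; it is an independent sketch with hypothetical function names. It relies on the observation that, once $\sigma^2 = V - \alpha^2(1 - 2/\pi)$ is eliminated, $\gamma$ increases monotonically with $\alpha$ over the allowed range $\sigma^- \ge 0$, so a simple bisection in $\alpha$ suffices. Each contribution is assumed to have $x_0 = 0$, so the returned $x_0$ is the shift $\Delta$.

```python
import math

SQRT_2PI = math.sqrt(2.0 * math.pi)


def model1_cumulants(sigma_minus, sigma_plus, x0=0.0):
    """mu, V and gamma of Equations (8) for a dimidated Gaussian."""
    sigma = 0.5 * (sigma_plus + sigma_minus)
    alpha = 0.5 * (sigma_plus - sigma_minus)
    d = sigma_plus - sigma_minus
    mu = x0 + d / SQRT_2PI
    var = sigma ** 2 + alpha ** 2 * (1.0 - 2.0 / math.pi)
    gamma = (2.0 * (sigma_plus ** 3 - sigma_minus ** 3)
             - 1.5 * d * (sigma_plus ** 2 + sigma_minus ** 2)
             + d ** 3 / math.pi) / SQRT_2PI
    return mu, var, gamma


def model1_from_cumulants(mu, var, gamma, iterations=200):
    """Use Equations (8) 'in the other sense': find sigma_minus, sigma_plus and x0."""
    k = 1.0 - 2.0 / math.pi

    def errors(a):
        # once alpha = a is chosen, sigma is fixed by V
        sigma = math.sqrt(max(0.0, var - k * a * a))
        return sigma - a, sigma + a

    target = abs(gamma)
    lo, hi = 0.0, math.sqrt(var / (1.0 + k))  # at hi, sigma_minus reaches zero
    if model1_cumulants(*errors(hi))[2] < target:
        raise ValueError("skew too large to be represented by a dimidated Gaussian")
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if model1_cumulants(*errors(mid))[2] < target:
            lo = mid
        else:
            hi = mid
    sigma_minus, sigma_plus = errors(0.5 * (lo + hi))
    if gamma < 0:  # mirror image: swap the two errors
        sigma_minus, sigma_plus = sigma_plus, sigma_minus
    x0 = mu - (sigma_plus - sigma_minus) / SQRT_2PI
    return sigma_minus, sigma_plus, x0


def combine_model1(errors_list):
    """Add the cumulants of contributions with x0 = 0, then invert.

    Returns (sigma_minus, sigma_plus, Delta) of the combined distribution.
    """
    totals = [0.0, 0.0, 0.0]
    for sigma_minus, sigma_plus in errors_list:
        for i, c in enumerate(model1_cumulants(sigma_minus, sigma_plus)):
            totals[i] += c
    return model1_from_cumulants(*totals)


print(combine_model1([(0.8, 1.2), (0.8, 1.2)]))
```

With the inputs of Table I this sketch agrees with the tabulated values to within about 0.01.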



2.5. Model 1

For Model 1, for which $\langle X \rangle = (\sigma^+ - \sigma^-)/\sqrt{2\pi}$, we have

$$V = \sigma^2 + \alpha^2\left(1 - \frac{2}{\pi}\right).$$



It is apparent that the dotted curve agrees much better with the solid one than the 'usual procedure' dashed curve does. It is not an exact match, but it does an acceptable job given that there are only three adjustable parameters in the function. If the shape of the solid curve is to be represented by a dimidated Gaussian, then it is plausible that the dotted curve is the 'best' such representation.
2.6. Model 2

The equivalents of Equations (8) are

$$\mu = x_0 + \alpha, \qquad V = \sigma^2 + 2\alpha^2, \qquad \gamma = 6\sigma^2\alpha + 8\alpha^3 \tag{9}$$



As with Model 1, these are used to find the cumulants of each contributing distribution, which are summed to give the three totals, and then Equations (9) are used again to find the parameters of the distorted Gaussian with this mean, variance and skew. The web program will also do these calculations.

Some results are shown in Figure 4 and Table II. The true convolution cannot be done analytically but can be done by a Monte Carlo calculation.

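Table II: Adding errors in Model 2

| $\sigma_1^-$ | $\sigma_1^+$ | $\sigma_2^-$ | $\sigma_2^+$ | $\sigma^-$ | $\sigma^+$ | $\Delta$ |
|---|---|---|---|---|---|---|
| 1.0 | 1.0 | 0.8 | 1.2 | 1.33 | 1.54 | 0.10 |
| 0.8 | 1.2 | 0.8 | 1.2 | 1.25 | 1.64 | 0.20 |
| 0.5 | 1.5 | 0.8 | 1.2 | 1.12 | 1.88 | 0.35 |
| 0.5 | 1.5 | 0.5 | 1.5 | 1.13 | 2.07 | 0.53 |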




Figure 4: Examples of combined errors using Model 2. 

Again the true curves (solid) are not well reproduced by the 'usual procedure' (dashed) but the curves with the correct cumulants (dotted) do a good job. (The sharp behaviour at the edge of the curves is due to the turning point of the parabola.)
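The same approach works for Model 2 and is even simpler: eliminating $\sigma^2 = V - 2\alpha^2$ from Equations (9) gives $\gamma = 6V\alpha - 4\alpha^3$, which increases with $\alpha$ for $0 \le \alpha < \sqrt{V/2}$. A minimal sketch, again independent of the web program and with hypothetical function names:

```python
import math


def model2_cumulants(sigma_minus, sigma_plus, x0=0.0):
    """mu, V and gamma of Equations (9) for a distorted Gaussian."""
    sigma = 0.5 * (sigma_plus + sigma_minus)
    alpha = 0.5 * (sigma_plus - sigma_minus)
    return x0 + alpha, sigma ** 2 + 2.0 * alpha ** 2, 6.0 * sigma ** 2 * alpha + 8.0 * alpha ** 3


def model2_from_cumulants(mu, var, gamma, iterations=200):
    """Invert Equations (9) by bisection on alpha, using gamma = 6*V*alpha - 4*alpha**3."""
    def skew(a):
        return 6.0 * var * a - 4.0 * a ** 3

    target = abs(gamma)
    lo, hi = 0.0, math.sqrt(var / 2.0)  # sigma**2 = V - 2*alpha**2 must stay positive
    if skew(hi) < target:
        raise ValueError("skew too large to be represented by Model 2")
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if skew(mid) < target:
            lo = mid
        else:
            hi = mid
    alpha = math.copysign(0.5 * (lo + hi), gamma)
    sigma = math.sqrt(max(0.0, var - 2.0 * alpha ** 2))
    return sigma - alpha, sigma + alpha, mu - alpha


def combine_model2(errors_list):
    """Sum cumulants of contributions (each with x0 = 0) and invert; returns (sigma_minus, sigma_plus, Delta)."""
    mu = var = gamma = 0.0
    for sigma_minus, sigma_plus in errors_list:
        m, v, g = model2_cumulants(sigma_minus, sigma_plus)
        mu, var, gamma = mu + m, var + v, gamma + g
    return model2_from_cumulants(mu, var, gamma)


print(combine_model2([(0.5, 1.5), (0.5, 1.5)]))
```

For the first, second and fourth rows of Table II this reproduces the quoted values to within about 0.01.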

2.7. Evaluating $\chi^2$

For Model 1 the contribution from a discrepancy $\delta$ is just $\delta^2/(\sigma^-)^2$ or $\delta^2/(\sigma^+)^2$, according to the sign of $\delta$. This is manifestly inelegant, especially for minimisation procedures, as $\delta$ goes through zero.



For Model 2 one has

$$\delta = \sigma u + A\sigma u^2. \tag{10}$$



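This can be considered as a quadratic for $u$ with a solution which, when squared, gives $u^2$, the contribution to $\chi^2$, as

$$\chi^2 = u^2 = \frac{2 + 4A\frac{\delta}{\sigma} - 2\left(1 + 4A\frac{\delta}{\sigma}\right)^{1/2}}{4A^2}. \tag{11}$$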



This is not really exact, in that it only takes one branch of the solution, the one approximating to the straight line, and does not consider the extra possibility that the $\delta$ value could come from an improbable $u$ value on the other side of the turning point of the parabola. Given this imperfection it makes sense to expand the square root as a Taylor series, which, neglecting correction terms above the second power, leads to

$$\chi^2 = \left(\frac{\delta}{\sigma}\right)^2\left[1 - 2A\left(\frac{\delta}{\sigma}\right) + 5A^2\left(\frac{\delta}{\sigma}\right)^2\right]. \tag{12}$$



This provides a sensible form for $\chi^2$ from asymmetric errors. It is important to keep the $\delta^4$ term rather than stopping at $\delta^3$ to ensure $\chi^2$ stays positive! Adding higher orders does not have a great effect. We recommend it for consideration when it is required (e.g. in fitting parton distribution functions) to form a $\chi^2$ from asymmetric errors.
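The three forms of the $\chi^2$ contribution can be compared side by side. In this minimal sketch (hypothetical function names), `chi2_model2_exact` implements Equation (11) on the principal branch only and refuses values of $\delta$ beyond the turning point, while `chi2_model2_series` implements Equation (12).

```python
import math


def chi2_model1(delta, sigma_minus, sigma_plus):
    """Model 1: (delta/sigma_plus)**2 for delta >= 0, (delta/sigma_minus)**2 otherwise."""
    s = sigma_plus if delta >= 0 else sigma_minus
    return (delta / s) ** 2


def _sigma_and_a(sigma_minus, sigma_plus):
    return 0.5 * (sigma_plus + sigma_minus), (sigma_plus - sigma_minus) / (sigma_plus + sigma_minus)


def chi2_model2_exact(delta, sigma_minus, sigma_plus):
    """Equation (11): u**2 from the principal root of delta = sigma*u + A*sigma*u**2."""
    sigma, a = _sigma_and_a(sigma_minus, sigma_plus)
    t = delta / sigma
    if a == 0.0:
        return t * t
    disc = 1.0 + 4.0 * a * t
    if disc < 0.0:
        raise ValueError("delta lies beyond the turning point of the parabola")
    return (2.0 + 4.0 * a * t - 2.0 * math.sqrt(disc)) / (4.0 * a * a)


def chi2_model2_series(delta, sigma_minus, sigma_plus):
    """Equation (12): the Taylor expansion kept up to the (delta/sigma)**4 term."""
    sigma, a = _sigma_and_a(sigma_minus, sigma_plus)
    t = delta / sigma
    return t * t * (1.0 - 2.0 * a * t + 5.0 * a * a * t * t)


for delta in (-1.0, -0.5, 0.5, 1.0, 1.5):
    print(delta,
          chi2_model1(delta, 0.8, 1.2),
          chi2_model2_exact(delta, 0.8, 1.2),
          chi2_model2_series(delta, 0.8, 1.2))
```

The bracket in Equation (12) is a quadratic in $\delta/\sigma$ with discriminant $4A^2 - 20A^2 < 0$, which is why keeping the $\delta^4$ term keeps the series positive for every $\delta \ne 0$.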



2.8. Weighted means 

The 'best' estimate (i.e. unbiassed and with smallest variance) from several measurements $x_i$ with different (symmetric) errors $\sigma_i$ is given by a weighted sum with $w_i = 1/\sigma_i^2$. We wish to find the equivalent for asymmetric errors.

As noted earlier, when sampling from an asymmetric distribution the result is biassed towards the tail. The expectation value $\langle x \rangle$ is not the location parameter $x_0$. So for an unbiassed estimator one must take

$$\hat{x} = \frac{\sum_i w_i\,(x_i - b_i)}{\sum_i w_i} \tag{13}$$

where

$$b = \frac{\sigma^+ - \sigma^-}{\sqrt{2\pi}} \quad \text{(Model 1)} \tag{14}$$

$$b = \alpha \quad \text{(Model 2)} \tag{15}$$



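The variance of this is given by

$$V_{\hat{x}} = \frac{\sum_i w_i^2 V_i}{\left(\sum_i w_i\right)^2}$$

where $V_i$ is the variance of the $i$-th measurement about its mean. Differentiating with respect to $w_i$ to find the minimum gives

$$\frac{2 w_i V_i}{\left(\sum_j w_j\right)^2} - \frac{2\sum_j w_j^2 V_j}{\left(\sum_j w_j\right)^3} = 0 \quad \forall i \tag{16}$$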
which is satisfied by $w_i = 1/V_i$. This is the equivalent of the familiar weighting by $1/\sigma^2$. The weights are given, depending on the model, by (see Equations (8) and (9))

$$V = \sigma^2 + \left(1 - \frac{2}{\pi}\right)\alpha^2 \quad \text{or} \quad V = \sigma^2 + 2\alpha^2 \tag{17}$$

Note that this is not the Maximum Likelihood estimator: writing down the likelihood and differentiating does not give a nice form. So in principle there may be better estimators, but they will not have the simple form of a weighted sum.
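Equations (13) to (17) translate directly into code. A minimal sketch with a hypothetical function name; each measurement is given as $(x_i, \sigma_i^-, \sigma_i^+)$, and all measurements are assumed to follow the same model.

```python
import math


def weighted_mean(measurements, model=2):
    """Bias-corrected weighted mean of Equation (13) with weights w_i = 1/V_i.

    measurements: iterable of (x, sigma_minus, sigma_plus).
    Returns (estimate, variance of the estimate).
    """
    num = den = 0.0
    for x, sigma_minus, sigma_plus in measurements:
        sigma = 0.5 * (sigma_plus + sigma_minus)
        alpha = 0.5 * (sigma_plus - sigma_minus)
        if model == 1:
            b = (sigma_plus - sigma_minus) / math.sqrt(2.0 * math.pi)  # Equation (14)
            v = sigma ** 2 + (1.0 - 2.0 / math.pi) * alpha ** 2       # Equation (17), Model 1
        else:
            b = alpha                                                  # Equation (15)
            v = sigma ** 2 + 2.0 * alpha ** 2                          # Equation (17), Model 2
        w = 1.0 / v
        num += w * (x - b)
        den += w
    return num / den, 1.0 / den


print(weighted_mean([(10.0, 0.8, 1.2), (12.0, 1.5, 2.5)], model=2))
```

With the optimal weights the variance of the estimate reduces to $1/\sum_i w_i$, which the function returns alongside the estimate.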



3. Asymmetric Statistical Errors 

As explained earlier, (log) likelihood curves are used to obtain the maximum likelihood estimate for a parameter and also the 68% central interval, taken as the values at which $\ln L$ falls by $\tfrac{1}{2}$ from its peak. For large $N$ this curve is a parabola, but for finite $N$ it is generally asymmetric, and the two points are not equidistant about the peak.

The bias, if any, is not connected to the form of the curve, which is a likelihood and not a pdf. Evaluating a bias is done by integrating over the measured value, not the theoretical parameter. We will assume for simplicity that these estimates are bias-free. This means that when combining errors there will be no shift of the quoted value.

3.1. Combining asymmetric statistical errors

Suppose estimates $\hat{a}$ and $\hat{b}$ are obtained by this method for variables $a$ and $b$. $\hat{a}$ could typically be an estimate of the total number of events in a signal region, and $\hat{b}$ the (scaled and negated) estimate of background, obtained from a sideband. We are interested in $u = a + b$, taking $\hat{u} = \hat{a} + \hat{b}$. What are the errors to be quoted on $\hat{u}$?

3.2. Likelihood functions known 

We first consider the case where the likelihood functions $L_a(\vec{x}\,|\,a)$ and $L_b(\vec{x}\,|\,b)$ are given.

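For the symmetric Gaussian case, the answer is well known. Suppose that the likelihoods are both Gaussian, and further that $\sigma_a = \sigma_b = \sigma$. The log likelihood term

$$-\frac{1}{2}\left[\left(\frac{\hat{a} - a}{\sigma}\right)^2 + \left(\frac{\hat{b} - b}{\sigma}\right)^2\right] \tag{18}$$

can be rewritten

$$-\frac{1}{2}\left[\left(\frac{\hat{a} + \hat{b} - (a + b)}{\sqrt{2}\,\sigma}\right)^2 + \left(\frac{\hat{a} - \hat{b} - (a - b)}{\sqrt{2}\,\sigma}\right)^2\right] \tag{19}$$

so the likelihood is the product of Gaussians for $u = a + b$ and $v = a - b$, with standard deviations $\sqrt{2}\,\sigma$.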

Picking a particular value of $v$, one can then trivially construct the 68% confidence region for $u$ as $[\hat{u} - \sqrt{2}\,\sigma, \hat{u} + \sqrt{2}\,\sigma]$. Picking another value of $v$, indeed any other value of $v$, one obtains the same region for $u$. We can therefore say with 68% confidence that these limits enclose the true value of $u$, whatever the value of $v$. The uninteresting part of $a$ and $b$ has been 'parametrised away'. This is, of course, the standard result from the combination of errors formula, but derived in a frequentist way using Neyman-style confidence intervals. We could construct the limits on $u$ by finding $\hat{u} + \sigma^+$ such that the integrated probability of a result as small as or smaller than the data is 16%, and similarly for $\sigma^-$, rather than taking the $\Delta \ln L = -\tfrac{1}{2}$ shortcut, and it would not affect the argument.

The question now is how to generalise this. For this to be possible the likelihood must factorise,

$$L(\vec{x}\,|\,a, b) = L_u(\vec{x}\,|\,u)\,L_v(\vec{x}\,|\,v) \tag{20}$$

with a suitable choice of the parameter $v$ and the functions $L_u$ and $L_v$. Then we can use the same argument: for any value of $v$ the limits on $u$ are the same, depending only on $L_u(\vec{x}\,|\,u)$. Because they are true for any $v$ they are true for all $v$, and thus in general.

There are cases where this can clearly be done. For two Gaussians with $\sigma_a \ne \sigma_b$ the result is the same as above but with $v = a\sigma_b^2 - b\sigma_a^2$. For two Poisson distributions $v$ is $a/b$. There are cases (with multiple peaks) where it cannot be done, but let us hope that these are artificially pathological.

On the basis that if it cannot be done, the question is unanswerable, let us assume that it is possible in the case being studied, and see how far we can proceed. Finding the form of $v$ is liable to be difficult, and as it is not actually used in the answer we would like to avoid doing so. The limits on $u$ are read off from the $\Delta \ln L(\vec{x}\,|\,u, v) = -\tfrac{1}{2}$ points, where $v$ can have any value provided it is fixed. Let us choose $v = \hat{v}$, the value at the peak. This is the value of $v$ at which $L_v(v)$ is a maximum. Hence when we consider any other value of $u$, we can find $v = \hat{v}$ by finding the point at which the likelihood is a maximum, varying $a - b$, or $a$, or $b$, or any other combination, always keeping $a + b$ fixed. We can read the limits off a one-dimensional plot of $\ln L_{max}(\vec{x}\,|\,u)$, where the 'max' suffix denotes that at each value of $u$ we search the subspace to pick out the maximum value.
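The $\ln L_{max}(\vec{x}\,|\,u)$ scan can be illustrated on a case where the answer is known: two Poisson counts $n_a$ and $n_b$, with $u = a + b$. This sketch (standard library only, hypothetical names) maximises over the split of $u$ between $a$ and $b$ by ternary search, then bisects for the $\Delta \ln L = -\tfrac{1}{2}$ points. For $n_a = 3$, $n_b = 2$ it should reproduce the likelihood interval of a single count of 5, roughly $5^{+2.58}_{-1.92}$.

```python
import math


def log_poisson(n, mu):
    """Poisson log likelihood without the constant -ln(n!)."""
    return n * math.log(mu) - mu


def ternary_max(f, lo, hi, iterations=100):
    """Locate the maximum of a unimodal function on [lo, hi]."""
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            lo = m1
        else:
            hi = m2
    return 0.5 * (lo + hi)


def profile_loglik(u, n_a, n_b):
    """ln L_max(u): maximise over the split of u between a and b, keeping a + b = u fixed."""
    f = lambda a: log_poisson(n_a, a) + log_poisson(n_b, u - a)
    a_best = ternary_max(f, 1e-12 * u, u * (1.0 - 1e-12))
    return f(a_best)


def profile_interval(n_a, n_b):
    """Peak of ln L_max(u) and the distances to its Delta ln L = -1/2 points."""
    top = 10.0 * (n_a + n_b + 1)
    u_hat = ternary_max(lambda u: profile_loglik(u, n_a, n_b), 1e-6, top)
    target = profile_loglik(u_hat, n_a, n_b) - 0.5

    def crossing(inside, outside):
        for _ in range(100):
            mid = 0.5 * (inside + outside)
            if profile_loglik(mid, n_a, n_b) > target:
                inside = mid
            else:
                outside = mid
        return 0.5 * (inside + outside)

    upper = crossing(u_hat, top)
    lower = crossing(u_hat, 1e-6)
    return u_hat, upper - u_hat, u_hat - lower


print(profile_interval(3, 2))
```

Here the maximisation can also be done by hand: the best split is $a = u\,n_a/(n_a + n_b)$, leaving $\ln L_{max} = (n_a + n_b)\ln u - u$ plus a constant, which is why the result matches a single Poisson count.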

This generalises to more complicated situations. If $u = a + b + c$ we again scan the $\ln L_{max}(\vec{x}\,|\,u)$ function, where the subspace is now two-dimensional.
3.3. Likelihood functions not completely known

In many cases the likelihood functions for $a$ and $b$ will not be given, merely estimates $\hat{a}$ and $\hat{b}$ and their asymmetric errors $\sigma_a^+$, $\sigma_a^-$, $\sigma_b^+$ and $\sigma_b^-$. All we can do is to use these to provide best-guess functions $L_a(\vec{x}\,|\,a)$ and $L_b(\vec{x}\,|\,b)$. A parametrisation of suitable shapes, which for $\sigma^+ \approx \sigma^-$ approximate to a parabola, must be provided. Choosing a suitable parametrisation is not trivial. The obvious choice of introducing small higher-order terms fails, as these dominate far from the peak. A likely candidate is:

$$\ln L(a) = -\frac{1}{2}\left[\frac{\ln\left(1 + (a - \hat{a})/\gamma\right)}{\ln\beta}\right]^2 \tag{21}$$

where $\beta = \sigma^+/\sigma^-$ and $\gamma = \sigma^+\sigma^-/(\sigma^+ - \sigma^-)$. This describes the usual parabola, but with the $x$-axis stretched by an amount that changes linearly with distance. Figure 5 shows two illustrative results. The first is the Poisson likelihood from 5 observed events (solid line), for which the estimate using the $\Delta \ln L = -\tfrac{1}{2}$ points is $\mu = 5^{+2.58}_{-1.92}$, as shown. The dashed line is that obtained by inserting these numbers into Equation (21). The second considers a measurement of $x = 100 \pm 10$, of which the logarithm has been taken, to give a value $4.605^{+0.095}_{-0.105}$. Again, the solid line is the true curve and the dashed line the parametrisation. In both cases the agreement is excellent over the range $\approx \pm 1\sigma$ and reasonable over the range $\approx \pm 3\sigma$.

Figure 5: Approximations using Equation 21
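Equation (21) is straightforward to code. The sketch below (hypothetical names) also evaluates the exact Poisson $\Delta \ln L$ for 5 observed events, whose likelihood errors are about $+2.58$ and $-1.92$, so the two can be compared as in the first panel of Figure 5. It returns $-\infty$ where $1 + (a - \hat{a})/\gamma \le 0$, outside the domain of the parametrisation, and falls back to the ordinary parabola when $\sigma^+ = \sigma^-$ (the limit of Equation (21) as the asymmetry vanishes).

```python
import math


def log_lik_param(a, a_hat, sigma_minus, sigma_plus):
    """Equation (21): ln L = -1/2 [ln(1 + (a - a_hat)/gamma) / ln(beta)]**2,

    with beta = sigma_plus/sigma_minus and gamma = sigma_plus*sigma_minus/(sigma_plus - sigma_minus).
    """
    d = a - a_hat
    if math.isclose(sigma_plus, sigma_minus):
        return -0.5 * (d / sigma_plus) ** 2  # symmetric limit: the usual parabola
    beta = sigma_plus / sigma_minus
    gamma = sigma_plus * sigma_minus / (sigma_plus - sigma_minus)
    arg = 1.0 + d / gamma
    if arg <= 0.0:
        return -math.inf  # outside the domain of the parametrisation
    return -0.5 * (math.log(arg) / math.log(beta)) ** 2


def poisson_delta_lnl(mu, n):
    """Exact Poisson ln L(mu) - ln L(n) for n observed events."""
    return n * math.log(mu / n) - (mu - n)


for mu in (3.0, 4.0, 6.0, 7.0, 8.0):
    print(mu, round(poisson_delta_lnl(mu, 5), 4), round(log_lik_param(mu, 5.0, 1.92, 2.58), 4))
```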

To check the correctness of the method we can use the combination of two Poisson numbers, for which the result is known. First indications are that the errors obtained from the parametrisation are indeed closer to the true Poisson errors than those obtained from the usual technique.
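Combining parametrised likelihoods then follows Section 3.2: for each trial $u$, maximise the sum of the two parametrised log likelihoods over the split of $u$ between $a$ and $b$, and read off the $\Delta \ln L = -\tfrac{1}{2}$ points. A minimal sketch, assuming Equation (21) for both inputs and working in deviations from the peaks (so the combined peak is simply $\hat{a} + \hat{b}$); the function names are hypothetical, and the inner search assumes the summed curve has a single maximum.

```python
import math


def log_lik_param(d, sigma_minus, sigma_plus):
    """Equation (21) written in terms of the deviation d from the peak."""
    if math.isclose(sigma_plus, sigma_minus):
        return -0.5 * (d / sigma_plus) ** 2
    beta = sigma_plus / sigma_minus
    gamma = sigma_plus * sigma_minus / (sigma_plus - sigma_minus)
    arg = 1.0 + d / gamma
    if arg <= 0.0:
        return -math.inf
    return -0.5 * (math.log(arg) / math.log(beta)) ** 2


def domain(sigma_minus, sigma_plus):
    """Interval of d on which 1 + d/gamma > 0 (the whole line in the symmetric case)."""
    if math.isclose(sigma_plus, sigma_minus):
        return -math.inf, math.inf
    gamma = sigma_plus * sigma_minus / (sigma_plus - sigma_minus)
    return (-gamma, math.inf) if gamma > 0 else (-math.inf, -gamma)


def profile(du, err_a, err_b, iterations=100):
    """ln L_max at u = u_hat + du: maximise over the deviation da of a, with db = du - da."""
    a_lo, a_hi = domain(*err_a)
    b_lo, b_hi = domain(*err_b)
    box = 20.0 * (sum(err_a) + sum(err_b)) + abs(du)
    lo = max(a_lo, du - b_hi, -box) + 1e-9
    hi = min(a_hi, du - b_lo, box) - 1e-9
    if lo >= hi:
        return -math.inf
    f = lambda da: log_lik_param(da, *err_a) + log_lik_param(du - da, *err_b)
    for _ in range(iterations):  # ternary search for the maximum
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            lo = m1
        else:
            hi = m2
    return f(0.5 * (lo + hi))


def combined_errors(err_a, err_b, iterations=100):
    """(sigma_minus, sigma_plus) on u = a + b from the Delta ln L = -1/2 points of the profile."""
    def crossing(inside, outside):
        for _ in range(iterations):
            mid = 0.5 * (inside + outside)
            if profile(mid, err_a, err_b) > -0.5:
                inside = mid
            else:
                outside = mid
        return 0.5 * (inside + outside)

    reach = 20.0 * (sum(err_a) + sum(err_b))
    return -crossing(0.0, -reach), crossing(0.0, reach)


print(combined_errors((1.416, 2.08), (1.10, 1.77)))
```

For symmetric inputs this reduces to adding errors in quadrature. The inputs in the last line are the Poisson likelihood errors for 3 and 2 events (about $^{+2.08}_{-1.42}$ and $^{+1.77}_{-1.10}$), so the output can be set against the exact $^{+2.58}_{-1.92}$ for 5 events.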

3.4. Combination of Results 

WEMT002

A related problem is to find the combined estimate given estimates $\hat{a}$ and $\hat{b}$ (which have asymmetric errors). Here $\hat{a}$ and $\hat{b}$ could be results from different channels or different experiments. This can be regarded as a special case, constrained to $a = b$, i.e. $v = a - b = 0$, but this is rather contrived. It is more direct just to say that one uses the log likelihood which is the sum of the two separate functions, and determines the peak and the $\Delta \ln L = -\tfrac{1}{2}$ points from that. If the functions are known this is unproblematic; if only the errors are given then the same parametrisation technique can be used.
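For the combination of results, the parametrised log likelihoods can simply be summed. A minimal sketch, assuming each result is summarised by Equation (21) and that the summed curve has a single peak lying between the smallest and largest estimates (hypothetical function names):

```python
import math


def log_lik_param(x, x_hat, sigma_minus, sigma_plus):
    """Equation (21) for one result x_hat with errors sigma_minus, sigma_plus."""
    d = x - x_hat
    if math.isclose(sigma_plus, sigma_minus):
        return -0.5 * (d / sigma_plus) ** 2
    beta = sigma_plus / sigma_minus
    gamma = sigma_plus * sigma_minus / (sigma_plus - sigma_minus)
    arg = 1.0 + d / gamma
    if arg <= 0.0:
        return -math.inf
    return -0.5 * (math.log(arg) / math.log(beta)) ** 2


def domain(x_hat, sigma_minus, sigma_plus):
    """Interval of x on which Equation (21) is defined."""
    if math.isclose(sigma_plus, sigma_minus):
        return -math.inf, math.inf
    gamma = sigma_plus * sigma_minus / (sigma_plus - sigma_minus)
    return (x_hat - gamma, math.inf) if gamma > 0 else (-math.inf, x_hat - gamma)


def combine_results(results, iterations=100):
    """results: list of (x_hat, sigma_minus, sigma_plus). Returns (peak, sigma_minus, sigma_plus)."""
    total = lambda x: sum(log_lik_param(x, *r) for r in results)
    # the peak lies between the smallest and largest estimate, inside every domain
    lo = max([min(r[0] for r in results)] + [domain(*r)[0] for r in results])
    hi = min([max(r[0] for r in results)] + [domain(*r)[1] for r in results])
    if lo > hi:
        raise ValueError("the parametrised likelihoods do not overlap")
    for _ in range(iterations):  # ternary search for the maximum
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if total(m1) < total(m2):
            lo = m1
        else:
            hi = m2
    peak = 0.5 * (lo + hi)
    target = total(peak) - 0.5
    reach = 20.0 * max(max(r[1], r[2]) for r in results)

    def crossing(inside, outside):
        for _ in range(iterations):
            mid = 0.5 * (inside + outside)
            if total(mid) > target:
                inside = mid
            else:
                outside = mid
        return 0.5 * (inside + outside)

    return peak, peak - crossing(peak, peak - reach), crossing(peak, peak + reach) - peak


print(combine_results([(5.0, 1.92, 2.58), (6.0, 1.5, 1.5)]))
```

A single input returns its own errors unchanged, and two equal symmetric inputs reproduce the familiar $1/\sqrt{2}$ reduction.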



4. Conclusions 



If asymmetric errors cannot be avoided they need careful handling.

A method is suggested and a program provided for 
combining asymmetric systematic errors. It is not 
'rigorously correct' but such perfection is impossible. 
Unlike the usual method, it is at least open about its 
assumptions and mathematically consistent. 

Formulae for $\chi^2$ and weighted sums are given.

A method is proposed for combining asymmetric 
statistical errors if the likelihood functions are known. 
Work is in progress to enable it to be used given only 
the results and their errors. 